3 research outputs found

    Applications of topic models

    No full text
    Describes recent academic and industrial applications of topic models, with the goal of equipping a young researcher to build their own applications of topic models.

    Rethinking LDA: Why Priors Matter

    No full text
    Implementations of topic models typically use symmetric Dirichlet priors with fixed concentration parameters, under the implicit assumption that such smoothing parameters have little practical effect. In this paper, we explore several classes of structured priors for topic models. We find that an asymmetric Dirichlet prior over the document-topic distributions has substantial advantages over a symmetric prior, while an asymmetric prior over the topic-word distributions provides no real benefit. Approximating this prior structure through simple, efficient hyperparameter optimization steps is sufficient to achieve these performance gains. The prior structure we advocate substantially increases the robustness of topic models to variations in the number of topics and to the highly skewed word frequency distributions common in natural language. Since this prior structure can be implemented using efficient algorithms that add negligible cost beyond standard inference techniques, we recommend it as a new standard for topic modeling.
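    The distinction the abstract draws between symmetric and asymmetric Dirichlet priors over document-topic distributions can be illustrated with a minimal sketch. The concentration values below are hypothetical, chosen only to show the effect; the paper's actual prior structure is learned via hyperparameter optimization, not hand-set:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    num_topics = 5

    # Symmetric prior: every topic shares one concentration parameter,
    # so all topics are a priori equally probable in a document.
    alpha_sym = np.full(num_topics, 0.1)

    # Asymmetric prior (hypothetical values): some topics are a priori
    # more probable, letting them absorb corpus-wide "background" words
    # while the remaining topics stay specific.
    alpha_asym = np.array([1.0, 0.5, 0.1, 0.1, 0.1])

    # Sample 1000 document-topic distributions under each prior.
    theta_sym = rng.dirichlet(alpha_sym, size=1000)
    theta_asym = rng.dirichlet(alpha_asym, size=1000)

    # Symmetric draws average to ~1/num_topics per topic; asymmetric
    # draws concentrate mass on the high-alpha topics.
    print(theta_sym.mean(axis=0))
    print(theta_asym.mean(axis=0))
    ```

    In practice, libraries such as gensim expose this choice directly (e.g. an `alpha='auto'` option that learns an asymmetric document-topic prior during inference), which matches the paper's recommendation of cheap hyperparameter optimization over hand-tuned smoothing.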

    Topic Models for Taxonomies

    No full text
    Concept taxonomies such as MeSH, the ACM Computing Classification System, and the NY Times Subject Headings are frequently used to help organize data. They typically consist of a set of concept names organized in a hierarchy. However, these names and this structure are often not sufficient to fully capture the intended meaning of a taxonomy node, and non-experts in particular may have difficulty navigating the taxonomy and placing data into it. This paper introduces two semi-supervised topic models that automatically augment a given taxonomy with many additional keywords by leveraging a corpus of multi-labeled documents. Our experiments show that users find the topics beneficial for taxonomy interpretation, substantially increasing their cataloging accuracy. Furthermore, the models provide a better information rate compared to Labeled LDA.